Which method is optimal for estimating variance components and their variability in generalizability theory? Evidence for a set of unified rules for the bootstrap method

Objective
The purpose of this study is to compare the performance of four estimation methods (the traditional, jackknife, bootstrap, and Markov chain Monte Carlo [MCMC] methods), identify the optimal one, and propose a set of unified rules for the bootstrap method.

Methods
Using four types of simulated data (normal, dichotomous, polytomous, and skewed), this study estimates variance components and their variability with each of the four methods for a p×i design in generalizability theory and compares the results. The estimated variance components are vc.p, vc.i, and vc.pi; their variability is described by their estimated standard errors (SE(vc.p), SE(vc.i), and SE(vc.pi)) and confidence intervals (CI(vc.p), CI(vc.i), and CI(vc.pi)).

Results
For the normal data, all four methods accurately estimate the variance components and their variability. For the dichotomous data, the |RPB| of SE(vc.i) for the traditional method is 128.5714, and the |RPB| values of SE(vc.i), SE(vc.pi), and CI(vc.i) for the jackknife method are 42.8571, 43.6893, and 40.5000; all exceed 25 and are therefore inaccurate. For the polytomous data, the |RPB| values of SE(vc.i) and CI(vc.i) for the MCMC method are 59.6612 and 45.2500, which exceed 25 and are inaccurate. For the highly skewed data, the |RPB| values of SE(vc.p), SE(vc.i), and SE(vc.pi) for the traditional and MCMC methods exceed 25 and are inaccurate. Only the bootstrap method estimates the variance components and their variability accurately across all data distributions. However, the divide-and-conquer strategy must be used when adopting the bootstrap method.

Conclusions
The bootstrap method is the optimal method of the four and shows cross-distribution superiority over the other three. A set of unified rules for its divide-and-conquer strategy is recommended: boot-p for p (person), boot-pi for i (item), and boot-i for pi (person × item).

recommended. However, these recommended unified rules have not been fully tested and have not been applied to different data types. Furthermore, previous literature has not proposed any unified rule for the confidence intervals of variance component estimates [20,21]. Indeed, it is highly challenging to provide a set of effective unified rules for estimating the confidence intervals of variance components, especially in complex generalizability designs.
Many characteristics, such as height, weight, and intelligence, approximately follow the normal distribution. However, non-normal data are common in psychological and educational measurement. For example, multiple-choice and yes/no questions in some exams are scored only as wrong or right (0 or 1), which yields dichotomous data. Similarly, some psychological and educational tests use multipoint rating scales, such as scores from 0 to 4 [22]; the five possible scores (0, 1, 2, 3, and 4) yield polytomous data. Skewed data are also common in practice: as the application fields of psychological and educational measurement have changed significantly with the development of society, the knowledge and abilities of the tested groups no longer follow a normal distribution and are skewed to some extent [23]. These four types of data (i.e., normal, dichotomous, polytomous, and skewed) are the ones most commonly encountered in practice.
The present study aims to address this problem. It compares the performance of four estimation methods (i.e., the traditional, jackknife, bootstrap, and MCMC methods) in estimating the standard errors and confidence intervals of estimated variance components with four types of data (i.e., normal, dichotomous, polytomous, and skewed) in a p×i design, and it proposes a unified formulation for these estimations. Specifically, we seek to determine whether any one estimation method has an advantage over the others. If an optimal method exists, we then explore whether there is a set of unified rules for estimating the standard errors and confidence intervals of the variance components with that method.
The study used a 4 × 4 × 3 design with three independent variables: (1) estimation method, with four levels (traditional, jackknife, bootstrap, and MCMC); (2) data distribution, with four levels (normal, dichotomous, polytomous, and skewed); and (3) measurement effect, with three levels (person, item, and person × item).
Three dependent variables: variance components, standard errors of variance components (SE(vc)), and confidence intervals of variance components (CI(vc)).

Traditional method.
The traditional method for estimating the standard errors of estimated variance components assumes that score effects have a multivariate normal distribution. Under this assumption, it can be shown that an estimator of the standard error of an estimated variance component is

$$\hat\sigma[\hat\sigma^2(\alpha|M)] = \sqrt{\sum_{\beta} \frac{2\,[f(\beta|\alpha)\,MS(\beta)]^2}{df(\beta)+2}} \tag{1}$$

where M designates the model, α indexes an effect, β indexes the mean squares that enter $\hat\sigma^2(\alpha|M)$, $df(\beta)$ is the degrees of freedom of $MS(\beta)$, and $f(\beta|\alpha)$ is the coefficient of $MS(\beta)$ in the linear combination of mean squares that gives $\hat\sigma^2(\alpha|M)$. The square of the right-hand side of Formula (1) is an unbiased estimator of the variance of $\hat\sigma^2(\alpha|M)$ [4]. When the sample is large and the score effects are assumed to be multivariate normal, the confidence interval of a variance component can be estimated as [4]:

$$\hat\sigma^2(\alpha|M) \pm z\,\hat\sigma[\hat\sigma^2(\alpha|M)] \tag{2}$$

where $\hat\sigma^2(\alpha|M)$ is assumed to follow a normal distribution, z is the critical value of the standard normal distribution (such as 1.96 or 2.58), and $\hat\sigma[\hat\sigma^2(\alpha|M)]$ is the standard error of $\hat\sigma^2(\alpha|M)$ given by Formula (1). Formula (2) is the traditional method's formula for estimating the confidence intervals of estimated variance components.

Jackknife method.
Quenouille (1949) [24] suggested a nonparametric estimator of bias. Tukey (1958) [25] extended Quenouille's idea to a nonparametric estimator of the standard error of a statistic. The theory underlying the jackknife is discussed extensively by Li and Zhang (2012) [14]. Here, we briefly outline the basics of the theory and then discuss its application to estimated variance components for the p×i design.
Suppose a set of S data points is used to estimate some parameter θ. The general steps in using the jackknife to estimate the standard error of $\hat\theta$ are:
1. Obtain $\hat\theta$ for all S data points;
2. Obtain the S estimates of θ that result from deleting each one of the data points, and let each such estimate be designated $\hat\theta_{-j}$;
3. For each of the S data points, obtain the pseudovalue $\hat\theta_{*j} = S\hat\theta - (S-1)\hat\theta_{-j}$;
4. Obtain the mean of the pseudovalues, $\hat\theta_J$, which is the jackknife estimator of θ;
5. Obtain the jackknife estimate of the standard error of $\hat\theta$:

$$s(\hat\theta_J) = \sqrt{\frac{1}{S(S-1)}\sum_{j=1}^{S}\left(\hat\theta_{*j} - \hat\theta_J\right)^2}$$

which is the standard error of the mean of the pseudovalues.
To establish a confidence interval using the jackknife, a distributional-form assumption is typically required. Usually, normality is assumed and Student's t distribution is employed. Thus, a 100(1−α)% confidence interval for θ is

$$\hat\theta_J \pm t\, s(\hat\theta_J)$$

where θ can be any one of the variance components and t is the 100(1−α/2) percentile point of the t distribution with $n_p n_i - 1$ degrees of freedom.
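The traditional Formulas (1) and (2) and the jackknife steps can be made concrete for the p×i design with the following Python sketch. It is not the authors' code (the study used R), and all function names are hypothetical. Variance components come from the usual p×i ANOVA mean squares, Formula (1) uses the coefficients f(β|α) implied by those estimators, and the 80% intervals use z; for the jackknife, z stands in for t, which is assumed to be a close approximation when the degrees of freedom are large. As a simplifying assumption, the jackknife here deletes one person at a time rather than treating persons and items jointly.

```python
import math
import random
from statistics import NormalDist


def anova_pxi(x):
    """Mean squares, degrees of freedom, and ANOVA variance-component estimates (p x i)."""
    n_p, n_i = len(x), len(x[0])
    grand = sum(map(sum, x)) / (n_p * n_i)
    row = [sum(r) / n_i for r in x]                          # person means
    col = [sum(r[i] for r in x) / n_p for i in range(n_i)]  # item means
    ss_p = n_i * sum((m - grand) ** 2 for m in row)
    ss_i = n_p * sum((m - grand) ** 2 for m in col)
    ss_pi = sum((x[p][i] - row[p] - col[i] + grand) ** 2
                for p in range(n_p) for i in range(n_i))
    df = {"p": n_p - 1, "i": n_i - 1, "pi": (n_p - 1) * (n_i - 1)}
    ms = {"p": ss_p / df["p"], "i": ss_i / df["i"], "pi": ss_pi / df["pi"]}
    vc = {"p": (ms["p"] - ms["pi"]) / n_i,  # sigma^2(p)
          "i": (ms["i"] - ms["pi"]) / n_p,  # sigma^2(i)
          "pi": ms["pi"]}                   # sigma^2(pi,e)
    return ms, df, vc


def traditional_se(x):
    """Formula (1): sqrt(sum over beta of 2 [f(beta|alpha) MS(beta)]^2 / (df(beta) + 2))."""
    ms, df, _ = anova_pxi(x)
    n_p, n_i = len(x), len(x[0])
    # f(beta|alpha): coefficient of MS(beta) in the estimator of sigma^2(alpha).
    coef = {"p": {"p": 1 / n_i, "pi": -1 / n_i},
            "i": {"i": 1 / n_p, "pi": -1 / n_p},
            "pi": {"pi": 1.0}}
    return {a: math.sqrt(sum(2 * (f * ms[b]) ** 2 / (df[b] + 2) for b, f in fs.items()))
            for a, fs in coef.items()}


def normal_ci(estimate, se, level=0.80):
    """Formula (2): estimate +/- z * se, with z taken from the standard normal."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * se, estimate + z * se


def jackknife_persons(x, effect):
    """Jackknife steps 1-5, deleting one person (row) at a time."""
    s = len(x)
    full = anova_pxi(x)[2][effect]                                      # step 1
    loo = [anova_pxi(x[:j] + x[j + 1:])[2][effect] for j in range(s)]  # step 2
    pseudo = [s * full - (s - 1) * t for t in loo]                      # step 3
    est = sum(pseudo) / s                                               # step 4
    se = math.sqrt(sum((v - est) ** 2 for v in pseudo) / (s * (s - 1)))  # step 5
    return est, se


# Demo: one simulated 50 x 10 data set with sigma_p = 2, sigma_i = 4, sigma_pi = 8.
rng = random.Random(1)
z_p = [rng.gauss(0, 1) for _ in range(50)]
z_i = [rng.gauss(0, 1) for _ in range(10)]
data = [[2 * zp + 4 * zi + 8 * rng.gauss(0, 1) for zi in z_i] for zp in z_p]

_, _, vc = anova_pxi(data)
se = traditional_se(data)
for effect in ("p", "i", "pi"):
    jk_est, jk_se = jackknife_persons(data, effect)
    # The normal z is used for the jackknife interval as an approximation to t.
    print(effect, "traditional:", normal_ci(vc[effect], se[effect]),
          "jackknife:", normal_ci(jk_est, jk_se))
```

The small additive matrix in the checks below has no interaction, so its vc.pi is exactly zero and only the person and item mean squares enter Formula (1).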

Bootstrap method.
The bootstrap is similar to the jackknife in that both are resampling methods and both are primarily nonparametric methods for assessing the accuracy of a particular $\hat\theta$ as an estimate of θ. A principal difference between the two methods is that the bootstrap employs sampling with replacement, whereas the jackknife employs sampling without replacement. Efron (1982) [26] provides an early theoretical treatment of the bootstrap.
For a statistic based on S observations, the bootstrap algorithm relies on multiple bootstrap samples, each consisting of a random sample of size S drawn with replacement from the original sample. Using the bootstrap, estimation of the standard error of a statistic $\hat\theta$ involves these steps [27]:
1. Using a random number generator, independently draw a large number of bootstrap samples, say B of them;
2. For each sample, evaluate the statistic of interest, say $\hat\theta_b$ (b = 1, 2, ..., B);
3. Calculate the sample standard deviation of the $\hat\theta_b$:

$$\hat\sigma_B = \sqrt{\frac{\sum_{b=1}^{B}\left(\hat\theta_b - \bar\theta_B\right)^2}{B-1}}, \qquad \bar\theta_B = \frac{1}{B}\sum_{b=1}^{B}\hat\theta_b$$

An appealing characteristic of the bootstrap algorithm is that it can be used almost automatically to obtain an approximate confidence interval, provided that the number of bootstrap samples is B ≥ 1000 [12]. For example, a simple approach to obtaining an 80% approximate confidence interval for θ is to use the 10th and 90th percentile points of the distribution of the $\hat\theta_b$.
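The three steps and the percentile interval can be sketched in Python for a generic statistic of S observations. The function name `bootstrap` and the example scores are hypothetical, and the mean is used only as an illustrative statistic; for variance components, what is resampled depends on the bootstrap strategy.

```python
import random
import statistics


def bootstrap(data, stat, B=1000, level=0.80, seed=None):
    """Bootstrap standard error and percentile interval of stat(data)."""
    rng = random.Random(seed)
    s = len(data)
    # Steps 1-2: B samples of size S drawn with replacement; evaluate the statistic on each.
    reps = sorted(stat([data[rng.randrange(s)] for _ in range(s)]) for _ in range(B))
    # Step 3: sample standard deviation of the replicates (divisor B - 1).
    se = statistics.stdev(reps)
    # Percentile interval: e.g. the 10th and 90th percentile points when level = 0.80.
    tail = (1 - level) / 2
    lower = reps[round(tail * B)]
    upper = reps[round((1 - tail) * B) - 1]
    return se, (lower, upper)


scores = [3, 5, 2, 8, 7, 4, 6, 5, 9, 1]
se, ci = bootstrap(scores, statistics.fmean, B=1000, seed=42)
print(round(se, 3), ci)
```

Using `round` for the percentile positions keeps the two tails symmetric despite floating-point error in `(1 - level) / 2`.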

MCMC method.
The Markov chain Monte Carlo (MCMC) procedure is a method of simulating random samples from any theoretical distribution, especially from a multivariate posterior distribution, in order to estimate features of that distribution [28]. The essential idea of MCMC is to define a Markov chain and to draw samples sequentially from it. For Bayesian inference, the Markov chain is defined in such a way that its stationary distribution is the posterior distribution of interest. The draws form a Markov chain in that the distribution of each sampled draw depends only on the last value drawn. If the procedure works well, the approximating distributions improve at each iteration and eventually converge to the target distribution.
In generalizability theory, the linear model for a p×i design can be written as

$$X_{pi} = \mu + \pi_p + \beta_i + \varepsilon_{pi}$$

where μ refers to the grand mean, and $\pi_p$, $\beta_i$, and $\varepsilon_{pi}$ refer to the person effect, the item effect, and the person×item effect (including the residual effect), respectively. An observed score can thus be viewed as having two parts: a linear combination of the grand mean, the person effect, and the item effect; and the residual effect, which can be assumed to follow a normal distribution.
In Bayesian analysis, we can assign priors to the distributions of the person and item effects. If normal distributions are assumed for both effects, the model can be written as

$$X_{pi} \sim N(\mu_{pi}, \sigma^2_{pi}), \quad \mu_{pi} = \mu + \pi_p + \beta_i, \quad \pi_p \sim N(0, \sigma^2_p), \quad \beta_i \sim N(0, \sigma^2_i)$$

In estimating the variability of estimated variance components, we are interested in the posterior means, posterior standard errors, and credible sets for $\hat\sigma^2_p$, $\hat\sigma^2_i$, and $\hat\sigma^2_{pi}$. To obtain these, we need to specify priors for $\sigma^2_p$, $\sigma^2_i$, and $\sigma^2_{pi}$, which are also called hyperpriors. In this study, following the practice of Mao et al. (2005) [29], we set the priors for p, i, and pi to τ(2, 4), τ(2, 16), and τ(2, 64), respectively, with initial values of 0.001.
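One common MCMC algorithm for this model is a Gibbs sampler, which draws each unknown in turn from its full conditional distribution. The Python sketch below is an illustration under stated assumptions, not the WinBUGS specification used in the study: it assumes a flat prior on μ and conjugate inverse-gamma(a0, b0) priors on $\sigma^2_p$, $\sigma^2_i$, and $\sigma^2_{pi}$ (a0 = b0 = 0.01 is a hypothetical default), rather than the τ(2, ·) priors above. Posterior means, standard deviations, and 80% credible intervals are then summarized from the retained draws.

```python
import random
import statistics


def inv_gamma(rng, shape, scale):
    """InvGamma(shape, scale) draw: reciprocal of a Gamma(shape, rate=scale) draw."""
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)


def gibbs_pxi(x, iters=1000, burn=200, a0=0.01, b0=0.01, seed=0):
    """Gibbs sampler for X_pi = mu + pi_p + beta_i + e_pi with normal random effects."""
    rng = random.Random(seed)
    n_p, n_i = len(x), len(x[0])
    n = n_p * n_i
    mu = statistics.fmean(v for row in x for v in row)
    eff_p, eff_i = [0.0] * n_p, [0.0] * n_i
    s2p = s2i = s2e = 1.0
    draws = []
    for it in range(iters):
        # mu | rest: normal (flat prior on mu).
        total = sum(x[p][i] - eff_p[p] - eff_i[i] for p in range(n_p) for i in range(n_i))
        mu = rng.gauss(total / n, (s2e / n) ** 0.5)
        # pi_p | rest: combines n_i observations with the N(0, s2p) prior.
        for p in range(n_p):
            prec = n_i / s2e + 1.0 / s2p
            mean = sum(x[p][i] - mu - eff_i[i] for i in range(n_i)) / s2e / prec
            eff_p[p] = rng.gauss(mean, prec ** -0.5)
        # beta_i | rest: combines n_p observations with the N(0, s2i) prior.
        for i in range(n_i):
            prec = n_p / s2e + 1.0 / s2i
            mean = sum(x[p][i] - mu - eff_p[p] for p in range(n_p)) / s2e / prec
            eff_i[i] = rng.gauss(mean, prec ** -0.5)
        # Variance components | rest: conjugate inverse-gamma updates.
        s2p = inv_gamma(rng, a0 + n_p / 2, b0 + sum(v * v for v in eff_p) / 2)
        s2i = inv_gamma(rng, a0 + n_i / 2, b0 + sum(v * v for v in eff_i) / 2)
        sse = sum((x[p][i] - mu - eff_p[p] - eff_i[i]) ** 2
                  for p in range(n_p) for i in range(n_i))
        s2e = inv_gamma(rng, a0 + n / 2, b0 + sse / 2)
        if it >= burn:  # keep draws after the burn-in period
            draws.append({"p": s2p, "i": s2i, "pi": s2e})
    return draws


def summarize(draws, effect):
    """Posterior mean, posterior SD, and 80% credible interval (10th/90th percentiles)."""
    vals = [d[effect] for d in draws]
    deciles = statistics.quantiles(vals, n=10)
    return statistics.fmean(vals), statistics.stdev(vals), (deciles[0], deciles[-1])


rng = random.Random(3)
z_p = [rng.gauss(0, 1) for _ in range(30)]
z_i = [rng.gauss(0, 1) for _ in range(10)]
data = [[2 * zp + 4 * zi + 8 * rng.gauss(0, 1) for zi in z_i] for zp in z_p]
draws = gibbs_pxi(data)
for effect in ("p", "i", "pi"):
    print(effect, summarize(draws, effect))
```

In practice, convergence of the chain would be checked (for example with CODA, as in the study) before the draws are summarized.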

Distribution data
Based on the p×i design in generalizability theory, the Monte Carlo simulation technique was used to generate four types of data (normal, dichotomous, polytomous, and skewed) with the statistical software R (version 2.13.0). The simulation procedures are as follows.

Normal data.
The procedure for simulating normally distributed data followed three steps [4]. First, the decomposition

$$X_{pi} = \mu + (\mu_p - \mu) + (\mu_i - \mu) + (X_{pi} - \mu_p - \mu_i + \mu)$$

was transformed into

$$X_{pi} = \mu + \sigma_p z_p + \sigma_i z_i + \sigma_{pi} z_{pi}$$

Second, within R, the rnorm function was called to randomly generate standard normal $z_p$, $z_i$, and $z_{pi}$, with the parameters $\sigma_p$, $\sigma_i$, and $\sigma_{pi}$ set to 2, 4, and 8 (with μ typically set to 0). Finally, the simulated data $X_{pi}$ were obtained. The number of simulations was 1000; thus, 1000 batches of 100×20 simulated normal data were generated.
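A Python analogue of these steps is sketched below; the study itself used R's rnorm, and here `random.Random.gauss` stands in for it. The function names are hypothetical. Each batch draws one $z_p$ per person, one $z_i$ per item, and a fresh $z_{pi}$ per cell, scaled by σ_p = 2, σ_i = 4, and σ_pi = 8 as in the text.

```python
import random


def simulate_normal_pxi(rng, n_p=100, n_i=20, sd_p=2.0, sd_i=4.0, sd_pi=8.0, mu=0.0):
    """One n_p x n_i batch of X_pi = mu + sd_p*z_p + sd_i*z_i + sd_pi*z_pi."""
    z_p = [rng.gauss(0.0, 1.0) for _ in range(n_p)]  # one draw per person
    z_i = [rng.gauss(0.0, 1.0) for _ in range(n_i)]  # one draw per item
    # A fresh z_pi is drawn for every person-item cell.
    return [[mu + sd_p * z_p[p] + sd_i * z_i[i] + sd_pi * rng.gauss(0.0, 1.0)
             for i in range(n_i)] for p in range(n_p)]


def simulate_batches(n_batches=1000, seed=None, **kwargs):
    """Independent batches generated from one seeded random number generator."""
    rng = random.Random(seed)
    return [simulate_normal_pxi(rng, **kwargs) for _ in range(n_batches)]


batches = simulate_batches(n_batches=5, seed=2013)  # n_batches=1000 matches the study
print(len(batches), len(batches[0]), len(batches[0][0]))  # 5 100 20
```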

Dichotomous data.
Once the simulated normal data $X_{pi}$ had been obtained (see above), $Y_{pi}$ was determined: if $X_{pi} \ge 0$, then $Y_{pi} = 1$; otherwise, $Y_{pi} = 0$. In this study, $Y_{pi}$ followed a Bernoulli distribution with a success probability of 0.5, that is, $P(Y_{pi} = 0) = 1 - P(X_{pi} \ge 0) = 0.5$. The simulation process was conducted 1000 times, generating 1000 batches of 100×20 dichotomous data.
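The dichotomization step can be sketched in Python as a threshold at 0 applied to simulated normal p×i scores (μ = 0). This is an illustration, not the study's R code; the function names are hypothetical, and the normal generator repeats the σ_p = 2, σ_i = 4, σ_pi = 8 setup.

```python
import random


def dichotomize(x):
    """Y_pi = 1 if X_pi >= 0, otherwise 0."""
    return [[1 if v >= 0 else 0 for v in row] for row in x]


def simulate_dichotomous_pxi(rng, n_p=100, n_i=20, sd_p=2.0, sd_i=4.0, sd_pi=8.0):
    """Simulate normal p x i scores with mu = 0 and dichotomize them at 0."""
    z_p = [rng.gauss(0.0, 1.0) for _ in range(n_p)]
    z_i = [rng.gauss(0.0, 1.0) for _ in range(n_i)]
    x = [[sd_p * zp + sd_i * zi + sd_pi * rng.gauss(0.0, 1.0) for zi in z_i] for zp in z_p]
    return dichotomize(x)


rng = random.Random(7)
batches = [simulate_dichotomous_pxi(rng) for _ in range(10)]  # the study used 1000
share_of_ones = sum(map(sum, batches[0])) / (100 * 20)
print(share_of_ones)
```

Because μ = 0 and the normal scores are symmetric about 0, each cell is 1 with probability 0.5, although the share of ones in a single batch varies with the sampled person and item effects.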

Skewed data.
It took three steps to simulate skewed data. In step one, the generalized hyperbolic (GH) distribution was defined. The density function of the GH distribution [31,32] is

$$f(x;\lambda,\alpha,\beta,\delta,\mu) = \frac{(\gamma/\delta)^{\lambda}}{\sqrt{2\pi}\,K_{\lambda}(\delta\gamma)}\, e^{\beta(x-\mu)}\, \frac{K_{\lambda-1/2}\!\left(\alpha\sqrt{\delta^{2}+(x-\mu)^{2}}\right)}{\left(\sqrt{\delta^{2}+(x-\mu)^{2}}/\alpha\right)^{1/2-\lambda}}, \qquad \gamma = \sqrt{\alpha^{2}-\beta^{2}}$$

where $K_\lambda$ is the modified Bessel function of the third kind. The properties of the GH distribution are mainly determined by five parameters: α, β, μ, λ, and δ. Among these, α and β indicate the kurtosis and skewness of the distribution, respectively; μ and δ indicate the location and scale of the density function; and λ indicates the tail thickness of the distribution. In step two, by calling the rhyperb function in R, three groups of skewed data (for p, i, and pi) were generated for the p×i design. The parameters of the GH distribution were controlled by setting λ = 1, μ = 0, δ = 1, and α = 3; β was free and could be set at -2, -1, 0, 1, or 2. Because the results for β and -β are symmetric, β was set only at -2, -1, and 0. Skewed data at a given skewness were generated using

$$X_{pi} = \mu + GH(p) + GH(i) + GH(pi)$$

(with μ usually set to 0). In step three, for each skewness level, the simulated skewed data were p×i matrices, and the number of simulations was 1000; thus, 1000 batches of 100×20 skewed data were simulated. With three skewness values (-2, -1, and 0), 3×1000 batches of 100×20 skewed data were generated.

Measurement effect
In this paper, three measurement effects are considered: the person effect (p), the item effect (i), and the person×item effect (pi,e). The person×item effect includes residual effects, which are considered random effects that cannot be separated from the interaction [33].

Comparison standard
Following the recommendations of Diallo et al. (2017) [34], this study uses the relative percentage bias (RPB) as the standard of comparison in estimating the variance components and their variability. The Bias and RPB are formulated as

$$\text{Bias} = \hat\theta - \theta$$

$$\text{RPB} = \frac{\hat\theta - \theta}{\theta} \times 100 \tag{13}$$

where $\hat\theta$ is the estimated value of the variance component or its variability, and θ is the corresponding parameter value. A smaller absolute value of the relative percentage bias (|RPB|) indicates a smaller difference between the estimated value and the parameter value, and thus a more reliable result, and vice versa.
Following the recommendations of Tong and Brennan (2007) [13], the following decision rules were adopted: (1) if |RPB| < 25, the deviation is relatively small and the estimate is considered accurate and reliable; (2) if |RPB| ≥ 25, the deviation is relatively large and the estimate is considered inaccurate and unreliable.
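Formula (13) and the decision rule reduce to a few lines of Python. The function names are hypothetical; the worked values (an estimate of 4.0284 against a parameter of 4.0000 for vc.p, and an |RPB| of 128.5714) are taken from the results reported in this paper.

```python
def bias(estimate, parameter):
    """Bias = estimate - parameter."""
    return estimate - parameter


def rpb(estimate, parameter):
    """Formula (13): relative percentage bias."""
    return (estimate - parameter) / parameter * 100


def grade(rpb_value, cutoff=25.0):
    """Decision rule: '+' (accurate) if |RPB| < 25, otherwise '-' (inaccurate)."""
    return "+" if abs(rpb_value) < cutoff else "-"


# vc.p, traditional method, normal data: estimate 4.0284, parameter 4.0000.
value = rpb(4.0284, 4.0000)
print(round(value, 4), grade(value))  # 0.71 +
print(grade(128.5714))                # -
```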

Analytical tools
Several statistical programs, including R, WinBUGS, R2WinBUGS, and CODA, were used in the current study. The analyses were completed without interruptions.

Estimating variance components and variability
Tables 1 and 2 show the estimates of the variance components and their variability (i.e., the standard errors and confidence intervals) based on the p×i design. The first column lists the four types of data: normal, dichotomous, polytomous, and skewed (low skew, β = 0; medium skew, β = -1; high skew, β = -2). The second column lists the four methods: the traditional, jackknife, bootstrap, and Markov chain Monte Carlo methods. For the bootstrap method, six resampling strategies were considered: boot-pi, boot-pir, boot-ir, boot-i, boot-pr, and boot-p.
The third column displays the estimated variance components for person, the fourth column their standard errors, the fifth column their confidence intervals, and so on. Note that in Tables 1 and 2, the parameter values are given in the first row for each distribution.
When only persons (p) are sampled, boot-p keeps items (i) and residuals (r) fixed; when only items are sampled, boot-i keeps p and r fixed; when p and i are sampled, boot-pi keeps r fixed; when p and r are sampled, boot-pr keeps i fixed; and when i and r are sampled, boot-ir keeps p fixed. When p, i, and r are all sampled, boot-pir samples p, i, and r simultaneously. "Parameter" denotes the parameter value.
The abbreviations "vc.p", "vc.i", and "vc.pi" denote the variance components of the person, the item, and the person×item interaction (including the residual), respectively. SE(vc.p), SE(vc.i), and SE(vc.pi) are their corresponding standard errors, and CI(vc.p), CI(vc.i), and CI(vc.pi) are the 80% confidence intervals of the estimates (80% CI).
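The three strategies that resample only persons and/or items can be sketched as index resampling on the p×i score matrix. This is a hypothetical Python illustration, not the study's implementation. boot-pr, boot-ir, and boot-pir also resample the residuals r, and they are not shown because the residual-resampling scheme is not specified here. Applying the p×i ANOVA estimators to many such resampled matrices yields the bootstrap standard errors and percentile intervals.

```python
import random


def resample_pxi(x, strategy, rng):
    """Resample a p x i score matrix under boot-p, boot-i, or boot-pi.

    boot-p:  persons drawn with replacement, items fixed.
    boot-i:  items drawn with replacement, persons fixed.
    boot-pi: persons and items drawn independently with replacement.
    """
    if strategy not in ("boot-p", "boot-i", "boot-pi"):
        raise ValueError("only boot-p, boot-i and boot-pi are sketched here")
    n_p, n_i = len(x), len(x[0])
    persons = range(n_p)
    items = range(n_i)
    if strategy in ("boot-p", "boot-pi"):
        persons = [rng.randrange(n_p) for _ in range(n_p)]
    if strategy in ("boot-i", "boot-pi"):
        items = [rng.randrange(n_i) for _ in range(n_i)]
    return [[x[p][i] for i in items] for p in persons]


scores = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
rng = random.Random(0)
for s in ("boot-p", "boot-i", "boot-pi"):
    print(s, resample_pxi(scores, s, rng))
```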
The estimates in Tables 1 and 2 were converted into RPB values using Formula (13) and are displayed in Tables 3 and 4. For example, how is the value 0.7100 for the traditional method with normal data in Table 3 obtained? First, find 4.0284 in column vc.p of Table 1, together with the corresponding parameter value of 4.0000 in the same table. Second, apply Formula (13):

$$\text{RPB} = \frac{\hat\theta - \theta}{\theta} \times 100 = \frac{4.0284 - 4.0000}{4.0000} \times 100 = 0.007100 \times 100 = 0.7100$$

This 0.7100 is the value reported in Table 3.
results, regardless of which one of the six resampling strategies is adopted (see the italic bold values; |RPB| < 25).

Comparing the performance of estimation methods with different types of data
Based on the standard of comparison and the decision rules, the performance of these methods under different data conditions is graded in Table 5, where "+" means accurate and "-" means inaccurate. As shown in Table 5  are accurate, and the standard errors and the 80% CI of estimated variance components contain a total of 2 to 6 resampling strategies. Furthermore, regardless of the data type, the bootstrap method always produces accurate estimates of the variance components and their variability with a suitable resampling strategy. In sum, the bootstrap method shows the best performance across all four types of data in estimating the variance components and their variability (no "-" was produced). The jackknife method (two "-") performs better than the traditional and MCMC methods (four "-" each). The traditional method produced a larger total error (|RPB| = 734.3263) than the MCMC method (|RPB| = 633.1782). Therefore, compared with the traditional method, the MCMC method is a better option for estimating the variability of estimated variance components.

Divide-and-conquer strategy of the bootstrap method with different data
Although the bootstrap method is optimal for estimating the variability of estimated variance components, it requires a divide-and-conquer strategy [13]; that is, the resampling strategy of the bootstrap method should be chosen according to the specific variability being estimated. We tested the divide-and-conquer strategy (based on Table 4) for the standard errors and confidence intervals of the variance components under different data distributions, and the results are summarized in Table 6. For normal data, SE(vc.p) can be estimated with boot-p and boot-ir (|RPB| < 25; 1.8664 and 1.9150, respectively; Table 4); boot-pi, boot-ir, boot-pir, and boot-i can be used to estimate SE(vc.i) (all |RPB| < 25); and SE(vc.pi) can be estimated with boot-ir, boot-pir, boot-pr, boot-p, and boot-i (all |RPB| < 25). For normally distributed data, all six bootstrap strategies can be used to estimate CI(vc.p).
With respect to the 80% coverage of the confidence intervals, the |RPB| values of boot-ir, boot-p, boot-i, boot-pir, boot-pr, and boot-pi are all less than 25 (1.2500, 3.2500, 13.0000, 14.3750, 15.0000, and 23.0000, respectively). CI(vc.i) can be estimated with boot-pi, boot-ir, boot-pir, and boot-i (all |RPB| < 25). All six strategies can be chosen for the estimation of CI(vc.pi). See Table 6 for the bootstrap strategies for estimating the standard errors and confidence intervals under the other data distributions.

The cross-distribution superiority of the bootstrap method
As shown in Table 5, for the normal data, all four methods accurately estimate the variance components, their standard errors, and their confidence intervals ("+"). When the bootstrap method is used, the divide-and-conquer strategy should be adopted.
For dichotomous data, all four methods perform well in estimating the variance components. However, when estimating the standard errors of the variance components, only the bootstrap method (using the divide-and-conquer strategy) produces accurate outcomes. Both the traditional method and the MCMC method overestimate the standard error of the variance component of i (by 128.5714% and 28.5714%, respectively), and the jackknife method underestimates the standard errors of the variance components of i and pi (by 42.8571% and 43.6893%, respectively). Regarding the confidence intervals of the variance components, both the MCMC method and the bootstrap method (using the divide-and-conquer strategy) produce accurate outcomes, and the traditional method performs relatively accurately as well. However, the jackknife method underestimates the 80% coverage of the confidence intervals of the variance components of i and pi (by 40.5000% and 35.7500%, respectively).
For polytomous data, all four methods estimate the variance components accurately. When estimating the standard errors of the variance components, the jackknife method and the bootstrap method (with the divide-and-conquer strategy) are accurate, whereas the traditional and MCMC methods underestimate the standard errors of the variance components of p (by 26.1152% and 26.0223%, respectively) and i (by 60.2965% and 59.6612%, respectively). In estimating the confidence intervals of the variance components, both the jackknife method and the bootstrap method (with the divide-and-conquer strategy) perform well, whereas the traditional and MCMC methods underestimate the 80% coverage of the confidence intervals of the variance component of i (by 51.6250% and 45.2500%, respectively).
For skewed data, in the cases of low skewness (β = 0) and medium skewness (β = -1), all four methods can accurately estimate the variance components, the standard errors of variance components, and the confidence intervals of variance components. Similar to previous results, the bootstrap method performs well when the divide-and-conquer strategy is used. In the case of high skewness (β = -2), all methods are accurate in estimating the variance components. In terms of estimating the standard errors of variance components, the jackknife method and the bootstrap method (when using the divide-and-conquer strategy) perform accurately, but the traditional method and the MCMC method underestimate the standard errors of the variance components of p (by 33.2794% and 33.4413%), i (31.7712% and 30.4428%) and pi (33.2099% and 34.5679%). As for the estimation of the confidence intervals of variance components, all methods yield accurate outcomes. Again, the bootstrap method can be accurate when the divide-and-conquer strategy is used.
In sum, first, all four methods can accurately estimate the three variance components (i.e., vc.p, vc.i, and vc.pi), which lays the basis for comparing the methods' estimates of the corresponding variability. Second, the estimated variability depends on the estimation method, and each method's performance depends on the type of data to which it is applied. Specifically, for normal data, all four methods are acceptable. For dichotomous data, the bootstrap method is the best option, as the other three methods perform poorly. For polytomous data, the bootstrap and jackknife methods perform better than the traditional and MCMC methods. Likewise, for skewed data, the bootstrap and jackknife methods are superior to the traditional and MCMC methods. Third, the methods differ in their overall performance in estimating the variability of the variance components: the bootstrap method is the best option, followed by the jackknife method, the MCMC method, and the traditional method. Finally, only the bootstrap method accurately estimates the variability of the variance components for all four types of data, showing superiority over the other methods across data conditions. Note that when the bootstrap method is used, the divide-and-conquer strategy should be chosen.

The unified rule of the bootstrap method
Since the bootstrap method is the only method in this study that performs well with all types of data, we argue that it offers a convenient approach to estimating the variability of estimated variance components: when a divide-and-conquer strategy is used, it produces accurate results regardless of the data type. However, one problem remains. Does the divide-and-conquer strategy vary with the data distribution? In other words, does a set of unified rules exist?

The unified rule for estimating the standard errors of variance components using the bootstrap method.
In terms of the standard errors of vc.p (see Table 6), only the boot-p and boot-ir strategies can be selected for normal and dichotomous data. However, boot-ir does not appear for the other types of data; only boot-p spans all four types of data with good performance. Thus, the boot-p strategy is the best estimation strategy for the standard errors of vc.p. For the standard errors of vc.i, the boot-pi strategy performs best for every data type except dichotomous data, for which it ranks second; it is therefore the best estimation strategy for the standard errors of vc.i. For the standard errors of vc.pi, only the boot-i and boot-pi strategies can be selected for dichotomous data. However, boot-pi appears only for dichotomous data, whereas boot-i spans all four types of data. As a result, the boot-i strategy is the best estimation strategy for the standard errors of vc.pi. To sum up, the boot-p strategy is optimal for estimating the standard errors of vc.p, the boot-pi strategy for vc.i, and the boot-i strategy for vc.pi. These unified rules for estimating the standard errors of variance components can guide future applications of the bootstrap method [13].

The unified rule for estimating the confidence intervals of the variance components using the bootstrap method.
In terms of the confidence intervals of vc.p (see Table 6), four strategies (boot-p, boot-pi, boot-pr, and boot-pir) apply to all four types of data. Their absolute relative percentage biases are |RPB| = 21.8750, 58.0000, 42.6250, and 41.5000, respectively; the boot-p strategy has the smallest deviation and is therefore the best estimation strategy for the confidence intervals of vc.p. For the confidence intervals of vc.i, four strategies (boot-pi, boot-i, boot-pir, and boot-ir) also appear under all types of data, with |RPB| values of 54.8750, 59.7500, 60.7500, and 53.2500, respectively. The deviation of the boot-pi strategy (54.8750) is relatively small and differs only slightly from that of the boot-ir strategy, which has the minimum |RPB| (53.2500); the boot-pi strategy can therefore be considered the best estimation strategy for the confidence intervals of vc.i. For the confidence intervals of vc.pi, only the boot-i and boot-pi strategies can be selected for dichotomous data, with |RPB| values of 10.5000 and 115.3750, respectively. The boot-i strategy has the smaller deviation and is the best estimation strategy for the confidence intervals of vc.pi. In summary, for the confidence intervals of the variance components, boot-p is optimal for p, boot-pi for i, and boot-i for pi.
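Taken together, the unified rules can be applied as one procedure: for each variance component, draw B bootstrap samples with the strategy the rules assign, then take the standard deviation and the 10th and 90th percentile points of the replicate estimates. The Python sketch below makes this concrete; it is hypothetical, uses the raw ANOVA estimates from each resample without any bias adjustment, and its function and dictionary names are illustrative only.

```python
import random
import statistics

# Unified rules: the resampling strategy assigned to each variance component.
UNIFIED_RULES = {"p": "boot-p", "i": "boot-pi", "pi": "boot-i"}


def vc_pxi(x):
    """ANOVA estimates of sigma^2(p), sigma^2(i), and sigma^2(pi,e) for a p x i matrix."""
    n_p, n_i = len(x), len(x[0])
    grand = sum(map(sum, x)) / (n_p * n_i)
    row = [sum(r) / n_i for r in x]
    col = [sum(r[i] for r in x) / n_p for i in range(n_i)]
    ms_p = n_i * sum((m - grand) ** 2 for m in row) / (n_p - 1)
    ms_i = n_p * sum((m - grand) ** 2 for m in col) / (n_i - 1)
    ms_pi = sum((x[p][i] - row[p] - col[i] + grand) ** 2
                for p in range(n_p) for i in range(n_i)) / ((n_p - 1) * (n_i - 1))
    return {"p": (ms_p - ms_pi) / n_i, "i": (ms_i - ms_pi) / n_p, "pi": ms_pi}


def resample_pxi(x, strategy, rng):
    """boot-p resamples persons, boot-i resamples items, boot-pi resamples both."""
    n_p, n_i = len(x), len(x[0])
    persons, items = range(n_p), range(n_i)
    if strategy in ("boot-p", "boot-pi"):
        persons = [rng.randrange(n_p) for _ in range(n_p)]
    if strategy in ("boot-i", "boot-pi"):
        items = [rng.randrange(n_i) for _ in range(n_i)]
    return [[x[p][i] for i in items] for p in persons]


def unified_bootstrap(x, B=1000, level=0.80, seed=None):
    """SE and percentile CI of each component, using the strategy the rules assign."""
    rng = random.Random(seed)
    tail = (1 - level) / 2
    results = {}
    for effect, strategy in UNIFIED_RULES.items():
        reps = sorted(vc_pxi(resample_pxi(x, strategy, rng))[effect] for _ in range(B))
        results[effect] = {
            "strategy": strategy,
            "se": statistics.stdev(reps),
            "ci": (reps[round(tail * B)], reps[round((1 - tail) * B) - 1]),
        }
    return results


rng = random.Random(11)
z_p = [rng.gauss(0, 1) for _ in range(40)]
z_i = [rng.gauss(0, 1) for _ in range(10)]
data = [[2 * zp + 4 * zi + 8 * rng.gauss(0, 1) for zi in z_i] for zp in z_p]
for effect, res in unified_bootstrap(data, B=300, seed=1).items():
    print(effect, res["strategy"], round(res["se"], 3), res["ci"])
```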

Limitations
The present study has several limitations. First, in the data simulation process, the skewed data were simulated without considering variations in kurtosis, dispersion, or tail thickness [31]. In addition, the sample size was fixed at 100×20; other sample sizes, such as 30×5, 30×20, 600×5, 600×20, 100×40, and 100×80, could be examined in future studies. Second, only the p×i design was investigated; other designs, such as i:p, i:h:p, p×i×h, p×(i:h), and i:(p×h), should be examined in future research. Third, this study used Bias and RPB as the standards of comparison for the variability of estimated variance components; future studies could also use the root mean square error (RMSE) [35]. Finally, the estimation of D study generalizability coefficients, such as indices of dependability and signal-to-noise ratios (S-N) for absolute decisions, should be assessed in future studies as well.

Conclusions
In this study, we examined the performance of four methods (i.e., traditional, jackknife, bootstrap, and MCMC) in estimating the variability of estimated variance components with four types of data (i.e., normal, dichotomous, polytomous, and skewed) and found that their performance differs across conditions. The bootstrap method is the only one that accurately estimates the variability of variance components with all four types of data, showing cross-distribution superiority over the other methods. When using the bootstrap method, the divide-and-conquer strategy should be used, and a set of unified rules exists for this strategy: the boot-p strategy is optimal for estimating the variance component and variability of p (person), the boot-pi strategy for i (item), and the boot-i strategy for pi (person × item). That is, boot-p for vc.p, SE(vc.p), and CI(vc.p); boot-pi for vc.i, SE(vc.i), and CI(vc.i); and boot-i for vc.pi, SE(vc.pi), and CI(vc.pi).